Abstract — Point positioning over the Earth's surface has become simpler since the advent of positioning systems based on artificial satellites. Today, GPS and GLONASS are the most structured satellite constellations of the GNSS, although other systems have been built in recent years to join it. There are several methods for precise positioning using the data transmitted by GNSS satellites, and the Precise Point Positioning (PPP) method is one of them. Like the others, PPP uses the observables to compute the coordinates and to estimate their precision. Precision, however, is different from accuracy: precision describes the quality of the data set, whereas accuracy tells us how close the coordinate is to its true position on the ground. Although the correlation between precision and accuracy is implicit in the observables, the processing methods cannot recover it. The purpose of this study was to identify this relationship using the data mining tool known as the Decision Tree. A large set of coordinates with known precision and accuracy was created for the recursive training of the Decision Tree, which became able to predict the accuracy of a coordinate using only its precision.

Keywords —  GNSS,  PPP,  Decision  Tree,  Precision, 
Accuracy. 


I.  INTRODUCTION 

The evolution of the global navigation systems based on artificial satellites that make up the Global Navigation Satellite Systems, or simply GNSS, has been regular and steady over the past decade, which suggests that point positioning by artificial satellites will soon reach a new stage.
Among  these  systems,  the  Global  Positioning  System 
(GPS)  is  in  a  more  advanced  stage,  finalizing  its 
modernization  with  the  planned  launch  of  Block  III 
satellites  and  other  investments  in  land  infrastructure.  The 
Russian  Global  Navigation  Satellite  System  (GLONASS) 
is  in  the  final  stages  of  completing  its  constellation,  while 
the  European  GALILEO  system  and  the  Chinese 
Compass  Navigation  Satellite  Experimental  System  or 
Beidou-1  are  in  intermediate  stages  of  deployment. 

In this context of novelty, and the consequent broadening of horizons, some topics still deserve research because they belong more to the fundamental technique of point positioning than to any particular positioning system. The relationship between the precision and the accuracy of a positioning solution is the subject of this paper, investigated with data observed by dual-frequency GNSS receivers.

The objective of this paper was to study the accuracy of coordinates obtained by the Precise Point Positioning (PPP) method and the feasibility of using them in engineering works that require good accuracy. To understand the behavior of accuracy, PPP processing results obtained over a period of six months were analyzed, taking into account the different sources of error that act on the propagated signal and can cause deviations beyond the limits acceptable for engineering purposes.

This project applied a machine learning technique. The technique uses a database populated with the known accuracy and precision of a set of previously measured points to induce, through computational training, a Decision Tree capable of estimating the accuracy of a new positioning for which only the precision is known.

Different observation methods can be developed using receivers of the signals transmitted by the satellites of the constellations that make up the GNSS. These methods produce the geodesic coordinates of points, with different precisions, over practically the entire physical surface of the Earth. Among them, the absolute method known as Precise Point Positioning (PPP) allows precise positioning using a single receiver to record the carrier phase data transmitted by the satellites, which are then processed together with precise ephemerides provided by the International GNSS Service (IGS). This method is very useful for determining the coordinates of points far from a terrestrial reference network.

Since it is an absolute method, PPP is not connected to the existing terrestrial geodesic networks in the study region; therefore, the coordinates it determines do not carry the adjustment residuals of an existing terrestrial geodesic network. In other words, with PPP each point determined is an independent point with its own accuracy. However, projects that use the coordinates of that point will certainly connect them to existing terrestrial geodesic networks, which can be a problem if their accuracy is not adequate.

The  Brazilian  Institute  of  Geography  and  Statistics 
(IBGE)  has  a  PPP  Service  available  online  (IBGE-PPP), 
which  processes  the  GNSS  data  and  provides  the 
coordinates  of  a  point  measured  using  dual  frequency 
receivers.  These  coordinates  are  linked  to  the  Geocentric 
Reference  System  for  the  Americas  (SIRGAS2000)  and 
to  the  International  Terrestrial  Reference  Frame  (ITRF). 
According to HOFMANN et al. (2007) [1], the technique used to determine coordinates by the PPP method relies on a least-squares adjustment and provides statistical indicators of the precision of the solution found in the adjustment.
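For reference, the generic weighted least-squares (Gauss–Markov) adjustment is summarized below, with design matrix A, weight matrix P, reduced observation vector ℓ, residual vector v, n observations and u unknowns. These symbols are introduced here for illustration only; IBGE-PPP may use a sequential or filtered variant, so this is not necessarily its exact formulation.

```latex
\hat{x} = (A^{\mathsf{T}} P A)^{-1} A^{\mathsf{T}} P \,\ell, \qquad
v = A\hat{x} - \ell,
\\[4pt]
\hat{\sigma}_0^2 = \frac{v^{\mathsf{T}} P v}{n - u}, \qquad
\Sigma_{\hat{x}} = \hat{\sigma}_0^2 \,(A^{\mathsf{T}} P A)^{-1}, \qquad
\sigma_{x_i} = \sqrt{(\Sigma_{\hat{x}})_{ii}}
```

The standard deviations σ_{x_i} are precision indicators of the kind reported per coordinate. They measure the internal consistency of the observations, not how close the estimate lies to the true position.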

As is well known, accuracy is different from precision, and for this reason there is some risk in accepting the coordinates resulting from PPP processing on the basis of their precision alone. In many situations, coordinates determined with very high precision do not have good accuracy and therefore do not represent the true position of the point on the Earth's physical surface. This happens when the initial data acquired by the receiver contain some kind of perturbation, such as multipath, which according to MONICO (2008) [2] is a local interference capable of degrading both the phase and the code observables and of producing coordinates that lie far from the point's real position on the ground. The study presented here was therefore developed to find a way to estimate indirectly how much the precision and the accuracy of a PPP-GNSS positioning solution differ.
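The distinction can be illustrated numerically. In the hypothetical sketch below, precision is taken as the sample standard deviation of repeated estimates of one coordinate, and accuracy as the absolute difference between their mean and the known value. The heights, including the constant 8 cm offset standing in for a multipath-like bias, are invented for illustration.

```python
from statistics import mean, stdev


def precision_and_accuracy(estimates, true_value):
    """Return (precision, accuracy) for repeated estimates of one coordinate.

    precision: sample standard deviation of the estimates (internal spread)
    accuracy:  absolute difference between their mean and the known value
    """
    return stdev(estimates), abs(mean(estimates) - true_value)


TRUE_HEIGHT = 100.000  # hypothetical known height of a control point (m)

# Hypothetical estimates: tightly clustered, but all shifted by about 8 cm,
# as a systematic perturbation such as multipath could do.
biased = [100.081, 100.079, 100.080, 100.082, 100.078]
# Hypothetical estimates: more scattered, but centred on the true value.
unbiased = [99.970, 100.030, 100.010, 99.990, 100.000]

for label, sample in (("biased", biased), ("unbiased", unbiased)):
    p, a = precision_and_accuracy(sample, TRUE_HEIGHT)
    print(f"{label:8s} precision = {p * 100:.1f} cm   accuracy = {a * 100:.1f} cm")
```

The biased set is far more precise than the unbiased one, yet it misses the known height by about 8 cm, which is exactly the situation a precision-only check cannot detect.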
This paper's main hypothesis is that, once the correlation between accuracy and precision in a significant set of GNSS data is known, the accuracy of a new measurement can be predicted from its precision using the Machine Learning technique known as the Decision Tree.

1.1 The Precise Point Positioning Technique (PPP)

PPP is a method in which the receiver's position coordinates are computed directly as a function of the satellites' position coordinates. It is an absolute positioning method, and for this reason PPP is also known as Precise Absolute Positioning. The georeferenced coordinates obtained with this method are not associated with any planimetric or altimetric network existing on the Earth's surface; for this reason, according to IBGE (2013) [3], PPP coordinates can differ significantly from the vertices of these terrestrial networks. In other words, coordinates determined with PPP may have unacceptable accuracy. PPP is similar to simple absolute positioning, but it is not the same, as there are some fundamental differences. One remarkable difference is that in PPP the receiver coordinates are computed from the precise ephemerides made available by the IGS network or a similar institution. This is a significant difference from the simple absolute positioning method, which uses the broadcast ephemerides transmitted by the satellites. In the PPP computation, the movement of tectonic plates, Earth tides, satellite clock errors, receiver clock errors, the satellite antenna center offsets, and the phase center of the receiver antenna are taken into account to obtain coordinates with good accuracy. Another important difference is that, in addition to C/A code data, the PPP method also uses L1 and L2 carrier phase data, which requires a dual-frequency receiver. Only this type of receiver provides the data needed to model the ionosphere and to form the ionosphere-free (ionofree) model which, according to XU (2016) [4], eliminates the effects of the ionosphere by combining the code and carrier phase equations. It is a linear combination of the data, extremely useful for eliminating the errors produced by ionospheric refraction as the signals cross the Earth's atmosphere toward the receiver. SILVA and SEGANTINE (2015) [5] estimate the precision of the PPP method at 5 to 10 cm, although some tests show that it can reach 2 to 5 cm, especially when data are collected for more than two hours and the solution converges.
The PPP method began to be offered in Brazil by the Brazilian Institute of Geography and Statistics (IBGE) around 2000, through the link http://www.ppp.ibge.gov.br/ppp.htm.
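As a minimal sketch of the ionosphere-free combination described above, the code below applies the standard dual-frequency combination (f1²·P1 − f2²·P2)/(f1² − f2²) to synthetic GPS pseudoranges. It models only the first-order ionospheric delay, 40.3·TEC/f²; the range, the TEC value and the function names are illustrative assumptions, not the IBGE-PPP implementation.

```python
F_L1 = 1575.42e6  # GPS L1 carrier frequency (Hz)
F_L2 = 1227.60e6  # GPS L2 carrier frequency (Hz)


def iono_free(obs_l1, obs_l2, f1=F_L1, f2=F_L2):
    """Ionosphere-free combination of two observables expressed in metres."""
    g1, g2 = f1 ** 2, f2 ** 2
    return (g1 * obs_l1 - g2 * obs_l2) / (g1 - g2)


def first_order_iono_delay(tec, f):
    """First-order ionospheric group delay (m) for a TEC in electrons/m^2."""
    return 40.3 * tec / f ** 2


# Hypothetical example: 22,000 km geometric range and a TEC of 50 TECU.
rho = 22_000_000.0
tec = 50e16
p1 = rho + first_order_iono_delay(tec, F_L1)  # synthetic L1 pseudorange
p2 = rho + first_order_iono_delay(tec, F_L2)  # synthetic L2 pseudorange

print(f"delay on L1: {p1 - rho:.3f} m, on L2: {p2 - rho:.3f} m")
print(f"iono-free minus true range: {iono_free(p1, p2) - rho:.2e} m")
```

The same combination applies to carrier phases expressed in metres (where the ionospheric term enters with the opposite sign). It removes only the first-order term and roughly triples the observation noise, which is one reason PPP needs long sessions to converge.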

Strictly speaking, the PPP method can also be applied to data collected with single-frequency receivers, which acquire data from only one carrier. In this case, other mathematical resources are applied to model the ionosphere, since the carrier phases cannot be combined. This case was not addressed in this study.

1.2  Decision  Tree 

Machine Learning is the capability of a computer system, trained with a large amount of data, to learn how to execute a certain task and to execute it with better performance in later attempts. WITTEN and FRANK (2005) [6] understand that the system modifies itself and automatically learns about a certain event, allowing a task from the same group of tasks to be performed more effectively the next time. It is a process that automatically or semi-automatically identifies the patterns implicit in large amounts of data.

Owing to this capacity, Machine Learning techniques are increasingly used to deal with highly complex problems that are difficult to conceptualize, in areas such as Mathematics, Medicine, Biology, and Engineering. ZHAN-LI et al. (2015) [7] demonstrated that this process can identify and synthetically recover three-dimensional points lost during the capture of a sequence of video images, a process conceptually very close to the classification of points determined by the PPP method. Current Machine Learning techniques include:

1) The Neural Network, or Multilayer Perceptron network, indicated for the multiclass classification of events, in which the number of learning examples is typically large.

2) The Support Vector Machine algorithm, extremely fast, but with the disadvantage of natively solving only binary problems, involving just two classes.

3) The Decision Tree, designed to work with an unlimited number of multivariate examples that serve as training data. It can also interpret the rules implicit in this dataset and then use them in a prediction process that classifies a new event according to how similar its characteristics are to those of the examples used in the training stage.

The  accuracy  of  the  coordinates  obtained  using  GNSS, 
especially  using  the  PPP  method,  can  be  understood  as  a 
complex  problem  since  the  PPP  is  dissociated  from 
existing  geodesic  networks  on  the  surface  of  the  Earth. 

For this reason, the Decision Tree is an appropriate tool to make explicit the positioning accuracy implicit in the observed data. Every GNSS measurement is made up of different variables that can be modeled, such as the observables, recording rate, collection time, ionospheric disturbance, and tropospheric disturbance, all of which can be analyzed by a Decision Tree. According to LEVINE et al. (1988) [8], a Decision Tree is induced (created) from a reliable database as a data structure built recursively from decision nodes, each corresponding to a test on a variable, and leaf nodes, each corresponding to a resulting class, as shown in Figure 1. To classify a measurement consisting of GNSS observables, the process begins at the root and follows the test at each node until a leaf is reached, at which point the classification takes place.
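The root-to-leaf classification described above can be sketched in a few lines of Python. The node layout (one numeric threshold test per decision node) and the attribute names `sigma_h` and `pct_rejected_gps` are assumptions for illustration; the thresholds are invented, not taken from the trees induced in this study.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # accuracy class stored at this leaf


@dataclass
class Node:
    attribute: str    # variable tested at this decision node
    threshold: float  # test: value <= threshold goes to the left subtree
    left: "Tree"
    right: "Tree"


Tree = Union[Node, Leaf]


def classify(tree: Tree, instance: dict) -> str:
    """Walk from the root, applying each node's test, until a leaf is reached."""
    while isinstance(tree, Node):
        if instance[tree.attribute] <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.label


# Hypothetical tree: attribute names and thresholds are invented for illustration.
toy_tree = Node("sigma_h", 0.010,
                left=Leaf("B"),
                right=Node("pct_rejected_gps", 5.0,
                           left=Leaf("C"),
                           right=Leaf("Z")))

print(classify(toy_tree, {"sigma_h": 0.015, "pct_rejected_gps": 12.0}))  # Z
```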

Each Decision Tree can be represented by a set of rules, each of which begins at the root of the tree and walks to one of its leaves. Like any other automated and repetitive procedure, the Decision Tree has advantages and disadvantages. Among the advantages, the following can be highlighted:

1) The Decision Tree is easy to create and easy to understand.

2) It does not require “a priori” definitions for any parameter of the data under analysis.

3) Decision tree induction algorithms are sensitive to variations in the training data, such as the number of examples used, the quality of the database, and the intensity of training control. Controlling these factors minimizes weak results at the decision points of the tree (decision nodes) and prevents inference errors from spreading to all subsequent branches.

4) The Decision Tree allows simultaneous classification of alphabetic (nominal), numerical, and alphanumeric data, provided that the output attribute is always a nominal class.
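The rule representation mentioned before the list of advantages, in which each rule runs from the root to one leaf, can be obtained mechanically from a tree. In this sketch the tree is a hypothetical nested tuple `(attribute, threshold, left, right)` with class labels as leaves; the names and thresholds are invented.

```python
def extract_rules(tree, conditions=()):
    """Return one (conditions, class) rule per root-to-leaf path.

    A tree is either a class label (leaf) or a tuple
    (attribute, threshold, left_subtree, right_subtree) meaning
    "attribute <= threshold" goes left, otherwise right.
    """
    if isinstance(tree, str):
        return [(list(conditions), tree)]
    attribute, threshold, left, right = tree
    return (extract_rules(left, conditions + (f"{attribute} <= {threshold}",))
            + extract_rules(right, conditions + (f"{attribute} > {threshold}",)))


# Hypothetical tree with invented attributes and thresholds.
toy_tree = ("sigma_h", 0.01, "B", ("pct_rejected_gps", 5.0, "C", "Z"))

for conds, label in extract_rules(toy_tree):
    print("IF " + " AND ".join(conds) + f" THEN class = {label}")
```

Each printed line is one rule; the number of rules always equals the number of leaves.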

After being recursively trained, a Decision Tree produces, as a result, a stratification of the data in the form of classes. According to RICH & KNIGHT (1991) [9], classification is an important component in solving many problems and, in its simplest form, can be considered a direct recognition task. From the point of view of machine learning, classifying is the process of assigning to a given item the name of the class to which it belongs. Before the Decision Tree induced in this study could classify the accuracy of the solutions for new positioning points, some tasks had to be carried out. First, a set of coordinates with known precision and accuracy was organized, and the Decision Tree was intensively trained on these data until it established the inference rules intrinsic to them; in that way, the tree became able to perform the classification.
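To make the idea of recursive training concrete, the sketch below shows the core step of decision tree induction: choosing, for a numeric attribute, the split threshold that maximizes information gain (the entropy criterion of ID3; C4.5, available in WEKA as J48, refines it with the gain ratio). A full induction algorithm applies this step recursively to each resulting subset. The attribute values and classes are hypothetical, and the text does not state which WEKA learner was used, so this illustrates the technique rather than the authors' exact procedure.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Pick the binary split 'value <= t' with the highest information gain.

    Candidate thresholds are midpoints between consecutive distinct values.
    """
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        t = (v1 + v2) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain


# Hypothetical training examples: height precision (m) and known accuracy class.
sigma_h = [0.004, 0.006, 0.007, 0.012, 0.015, 0.020]
classes = ["A", "A", "B", "C", "Z", "Z"]
print(best_threshold(sigma_h, classes))
```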
Fig. 1: Decision Tree Conceptual Structure.


II.  MATERIALS:  STUDY  AREA  AND  DATA  SET 

In this paper, a reference database composed of a multivariate dataset was prepared. This dataset was used to create the Decision Tree and enable it to predict the accuracy of results. The data of the reference bank were acquired at three geodesic stations located in the state of Sao Paulo, shown in Figure 2.


Fig. 2: Geodesic stations used.


These  stations  belong  to  the  Brazilian  GNSS  Systems 
Continuous  Monitoring  Network  (RBMC),  managed  by 
the  Brazilian  Institute  of  Geography  and  Statistics 
(IBGE).  More  precisely  the  geodesic  stations  are  detailed 
as  follows: 

-EESC station, code 99560, with official coordinates published by IBGE of latitude (φ): 22° 00' 17.8160" S, longitude (λ): 47° 53' 57.0497" W, and geometric height (h): 824.587 m, fixed on a metal tower on the roof of the School of Engineering of Sao Carlos, in the city of Sao Carlos (SP), Brazil, where a dual-frequency Leica GR10 GNSS receiver operates.

-SPBO station, code 99537, with official coordinates published by IBGE of latitude (φ): 22° 51' 08.8825" S, longitude (λ): 48° 25' 56.282" W, and geometric height (h): 803.122 m, fixed on a cylindrical pillar on the slab next to the Didactic Laboratory of Topography and Remote Sensing of the Department of Rural Engineering of the Faculty of Agronomic Sciences of UNESP, in the city of Botucatu (SP), Brazil, where a dual-frequency Leica GRX 1200 plus GNSS receiver operates.

-SPC1 station, code 96181, with official coordinates published by IBGE of latitude (φ): 22° 48' 58.6305" S, longitude (λ): 47° 03' 45.6958" W, and geometric height (h): 622.980 m, fixed on a concrete cylinder at the top of the building of the Department of Geotechnics and Transportation, Faculty of Civil Engineering of Unicamp, in the city of Campinas (SP), Brazil, where a dual-frequency Trimble NETR9 GNSS receiver operates.

At these stations, GNSS data acquired from January to May 2016, on days spaced 15 days apart, were processed by the PPP method and used to compose the reference database.

Since all data observed at RBMC geodesic stations are stored in continuous 24-hour files, several files with 3 hours of data each had to be extracted for each day of the study. This was done because, according to IBGE (2013) [3], a PPP solution converges after two hours of stored data, and one of this study's objectives was to analyze one hour of data with the same convergence pattern.

Therefore, the first file of the day contains data from 4:00 a.m. to 7:00 a.m., the second from 5:00 a.m. to 8:00 a.m., and so on, up to the last file of the day, which contains data from 3:00 p.m. to 6:00 p.m. In this way, 12 files were prepared for each day, covering the daytime period from 4:00 a.m. to 6:00 p.m. This is considered business time, when most companies working in georeferencing acquire the data they will use in engineering services.

This  form  of  organization  allowed  for  preparation  of  132 
observation  sessions  in  each  geodesic  station,  totaling  396 
study  sessions,  which  were  used  in  the  creation  and 
training  of  the  Decision  Tree. 
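The session layout described above can be reproduced with a short sketch. The number of observation days per station (11) is not stated in the text; it is inferred here from 132 sessions divided by 12 windows per day, which appears consistent with the 15-day spacing between January and May 2016.

```python
# Sketch of the sliding three-hour windows described in the text.
FIRST_START_H = 4   # first window starts at 4:00 a.m.
LAST_START_H = 15   # last window starts at 3:00 p.m.
WINDOW_H = 3        # each file holds three hours of data
STATIONS = ("EESC", "SPBO", "SPC1")

# (start hour, end hour) pairs, one per file extracted from the 24-hour file
windows = [(start, start + WINDOW_H) for start in range(FIRST_START_H, LAST_START_H + 1)]

sessions_per_station = 132
days_per_station = sessions_per_station // len(windows)  # inferred, not stated in the text
total_sessions = sessions_per_station * len(STATIONS)

print(len(windows), windows[0], windows[-1])  # 12 (4, 7) (15, 18)
print(days_per_station, total_sessions)       # 11 396
```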

Following the instructions of the PPP-IBGE manual, each three-hour file was submitted online at http://www.ibge.gov.br/home/geociencias/geodesia/ppp/default.shtm, where the L1 and L2 carrier phase data transmitted by the satellites of the GPS and GLONASS constellations were processed by the PPP method with a 10° elevation mask. Each processed file produced several pieces of information relevant to the interpretation and analysis of the positioning results, among them the coordinates of the point and the 17 variables described below:

1. Precision of the PPP solution in latitude;

2. Precision of the PPP solution in longitude;

3. Precision of the PPP solution in geometric height;

4. Number of processed GPS epochs;

5. Number of rejected GPS epochs;

6. Residuals of GPS pseudoranges;

7. Residuals of GPS carrier phases;

8. Number of processed GLONASS epochs;

9. Number of rejected GLONASS epochs;

10. Residuals of GLONASS pseudoranges;

11. Residuals of GLONASS carrier phases;

12. Percentage of rejected GPS epochs;

13. Percentage of rejected GLONASS epochs;

14. Accuracy of the solution in latitude;

15. Accuracy of the solution in longitude;

16. Accuracy of the solution in geometric height; and

17. Accuracy class.

Strictly speaking, PPP-IBGE processing provides the first eleven variables, and the last six are obtained by crossing the data. The accuracy of the coordinates, for instance, was obtained by comparing the measured coordinates with the known coordinates of each geodesic station. This was done to highlight the points that were important in the Decision Tree training, namely the percentages of rejected GNSS epochs, both GPS and GLONASS, whose proportion is directly related to the precision of the positioning result.
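The crossed variables can be sketched as follows. The denominator used for the rejection percentages (processed plus rejected epochs) is an assumption, since the text does not define it, and the conversion of latitude and longitude differences to metres uses the GRS80 radii of curvature in a small-difference approximation. The PPP solution in the example is hypothetical; the SPC1 reference coordinates are those published for the station.

```python
import math

# GRS80 ellipsoid constants (SIRGAS2000 is realised on GRS80).
A_GRS80 = 6378137.0
E2_GRS80 = 0.00669438002290


def dms_to_deg(deg, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W negative)."""
    sign = -1.0 if hemisphere in ("S", "W") else 1.0
    return sign * (deg + minutes / 60.0 + seconds / 3600.0)


def pct_rejected(processed, rejected):
    """Percentage of rejected epochs, ASSUMING total epochs = processed + rejected."""
    total = processed + rejected
    return 100.0 * rejected / total if total else 0.0


def differences_m(lat, lon, h, lat_ref, lon_ref, h_ref):
    """Approximate north, east and height differences (m) between a PPP
    solution and the known coordinates (angles in decimal degrees)."""
    phi = math.radians(lat_ref)
    w = math.sqrt(1.0 - E2_GRS80 * math.sin(phi) ** 2)
    m_radius = A_GRS80 * (1.0 - E2_GRS80) / w ** 3  # meridian radius of curvature
    n_radius = A_GRS80 / w                          # prime-vertical radius of curvature
    d_north = m_radius * math.radians(lat - lat_ref)
    d_east = n_radius * math.cos(phi) * math.radians(lon - lon_ref)
    return d_north, d_east, h - h_ref


# Known SPC1 coordinates and a HYPOTHETICAL PPP solution.
lat_ref = dms_to_deg(22, 48, 58.6305, "S")
lon_ref = dms_to_deg(47, 3, 45.6958, "W")
h_ref = 622.980
lat_ppp = dms_to_deg(22, 48, 58.6315, "S")  # 0.0010" further south
lon_ppp = dms_to_deg(47, 3, 45.6958, "W")
h_ppp = 622.930

dn, de, dh = differences_m(lat_ppp, lon_ppp, h_ppp, lat_ref, lon_ref, h_ref)
print(f"dN = {dn:.3f} m, dE = {de:.3f} m, dh = {dh:.3f} m")
print(f"GPS rejected: {pct_rejected(1080, 120):.1f} %")
```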

2.1  Decision  Tree  Induction  Software 
To interpret the 17 variables produced by PPP processing and to identify how they are related, we used the open-source software developed by Professors Ian H. Witten and Eibe Frank of the University of Waikato, New Zealand, known as WEKA (Waikato Environment for Knowledge Analysis), version 3.8.


This software was chosen for its capability of working with large volumes of data and for offering different Machine Learning techniques, including Decision Trees. It facilitated the construction of several decision trees for this study, such as the example above, until the appropriate version to carry out the classification was reached.
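WEKA reads training data in its ARFF format, which declares each attribute before the data rows. A minimal sketch of exporting the reference bank in that format might look like the following; the relation name, attribute names and values are hypothetical, and only five of the seventeen variables are shown. The class attribute is declared as nominal ({A,B,C,D,Z}), matching the requirement that the output attribute be a class.

```python
def to_arff(relation, numeric_attributes, class_attribute, class_values, rows):
    """Return ARFF text with numeric attributes and one nominal class attribute.

    Each row is a tuple of numeric values followed by the class label.
    """
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in numeric_attributes]
    lines.append(f"@ATTRIBUTE {class_attribute} {{{','.join(class_values)}}}")
    lines += ["", "@DATA"]
    for *values, label in rows:
        if len(values) != len(numeric_attributes):
            raise ValueError("row length does not match the attribute list")
        lines.append(",".join([str(v) for v in values] + [label]))
    return "\n".join(lines) + "\n"


# Hypothetical subset of the reference bank (names and values invented).
attributes = ["sigma_lat", "sigma_lon", "sigma_h", "pct_rej_gps", "pct_rej_glonass"]
rows = [
    (0.004, 0.006, 0.012, 1.5, 3.0, "B"),
    (0.005, 0.007, 0.018, 9.0, 14.0, "Z"),
]
arff_text = to_arff("ppp_accuracy", attributes, "accuracy_class",
                    ["A", "B", "C", "D", "Z"], rows)
print(arff_text)
```

The resulting text could be saved with an `.arff` extension and opened in the WEKA Explorer.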

2.2  Accuracy  Classes 

During the computational training of a Decision Tree, the system creates classification rules from known situations in order to predict new events. For this reason, the Decision Tree needs to be told the interval of each class to be considered.

Working with geodesic stations that have known coordinates, it is always possible to assess the quality of PPP positioning. Comparing the coordinates determined by PPP with the known coordinates makes it possible to establish the classes and their ranges, which must then be respected in the prediction process. In this study, the following accuracy classes were defined for training the trees:
CLASS  ACCURACY 

A  0.0  to  2.0  cm 

B  2.1  to  4.0  cm 

C  4.1  to  6.0  cm 

D  6.1  to  8.0  cm 

Z  >  8.0  cm. 
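The class table above can be expressed as a small mapping function. Because the published ranges leave gaps (for example, between 2.0 and 2.1 cm), this sketch assumes accuracies rounded to 0.1 cm and treats each upper bound as inclusive; how the study handled boundary values is not stated.

```python
# Upper bounds (cm) of the accuracy classes A to D; anything larger is Z.
CLASS_BOUNDS_CM = (("A", 2.0), ("B", 4.0), ("C", 6.0), ("D", 8.0))


def accuracy_class(error_cm):
    """Map an accuracy value in centimetres to class A, B, C, D or Z.

    ASSUMPTION: values are rounded to 0.1 cm and each upper bound is inclusive
    (e.g. 2.0 cm -> A, 2.1 cm -> B); anything above 8.0 cm is Z.
    """
    e = round(abs(error_cm), 1)
    for label, upper in CLASS_BOUNDS_CM:
        if e <= upper:
            return label
    return "Z"


print([accuracy_class(x) for x in (1.5, 3.2, 4.6, 7.9, 9.4)])  # ['A', 'B', 'C', 'D', 'Z']
```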


www.ijaers.com 


Page  |  122 


International Journal of Advanced Engineering Research and Science (IJAERS) [Vol-5, Issue-12, Dec-2018]

https://dx.doi.org/10.22161/ijaers.5.12.16 ISSN: 2349-6495(P) / 2456-1908(O)


The reference bank used for the Decision Tree training contained only the information known at the three geodesic stations, which thus served as the known reference in the process. It was populated with the 396 measurement sessions, each containing the 17 attributes mentioned. The known accuracy class in each case was assigned by the researcher and became the 18th attribute in the database. The following figure shows the accuracy implicit in the Decision Tree training data. It also shows that a user who relies only on precision is unaware of the accuracy of the result and is exposed to the risk of adopting as reliable some sets of coordinates that are very distant from the real position of the point on the ground. The Decision Tree interprets what really matters in a positioning, which is the accuracy of the result.


III.  VALIDATION  TEST 

Whenever the Decision Tree is triggered to classify the accuracy of a new PPP positioning solution of which only the precision is known, it follows the rules implicit in the reference bank, identifies the links that may exist among the 17 variables of the new measurement, and predicts the 18th variable, the still-unknown accuracy of the new solution, as described by WITTEN and FRANK (2005) [6]. Because it is an inferential method, there is always a probability of error, directly associated with the quality of the reference bank. To verify the quality of the Decision Tree's predictions, a specific validation stage was developed.


To carry out the validation test, a new set of GNSS data could be acquired at any randomly chosen location inside the triangle formed by the EESC, SPBO and SPC1 geodesic stations, provided it differed from these stations, which had already been used in the training stage. If the chosen location had no control, however, we would have to accept the Decision Tree's prediction with no means of verifying its quality, which is precisely the object of the validation.

To know exactly how the Decision Tree classified the new data, we decided to use a fourth RBMC geodesic station, located inside the area formed by the three initial ones. The Decision Tree's classification of the data collected at this fourth station was used to validate the quality of its predictions. The station chosen for the validation test was:

-SPPI station, code 99588, with official coordinates published by IBGE of latitude (φ): 22° 42' 10.9769" S, longitude (λ): 47° 37' 25.0333" W, and geometric height (h): 561.88 m, fixed on a concrete cylinder built at the USP/ESALQ Meteorological Station, in the city of Piracicaba (SP), Brazil, where a dual-frequency Trimble NETR8 GNSS receiver operates.

The data acquired at this station were used to organize 33 observation sessions spread from January to May 2016, on days different from those used to compose the reference bank. The data files of each session were organized in the same way as the training files, i.e., three hours each, a period sufficient for the PPP solution to converge.

These 33 files were sent online for PPP processing on the IBGE website. The table below shows the differences found in latitude (Δφ), longitude (Δλ), and geometric height (Δh), in meters, obtained by comparing the coordinates calculated in each session with the known coordinates of the SPPI station; these differences give the actual accuracy of each session. The last column on the right presents the accuracy prediction made by the Decision Tree as the letters A, B, C, D and Z, which represent the class estimated for each session according to item 2.2.
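The "Real" accuracy column in Table 1 appears to be consistent with the three-dimensional resultant of the three differences, √(Δφ² + Δλ² + Δh²). This is an interpretation of the table rather than a definition given in the text, and small discrepancies are expected because the tabulated components are rounded to centimetres.

```python
import math


def resultant(d_north, d_east, d_h):
    """3D resultant of north, east and height differences (same units)."""
    return math.sqrt(d_north ** 2 + d_east ** 2 + d_h ** 2)


# Session 1 of Table 1 (components rounded to centimetres in the table).
print(round(resultant(-0.03, -0.01, -0.03), 2))  # 0.04
```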
Table 1: Accuracy Classification made by the Decision Tree in the Validation Test


Session   Latitude Δφ (m)   Longitude Δλ (m)   Height Δh (m)   Real accuracy (m)   Predicted class
1         -0.03             -0.01              -0.03           0.04                B
2         -0.02              0.01              -0.04           0.05                C
3         -0.02              0.02              -0.04           0.05                C
4         -0.01              0.03              -0.06           0.07                Z
5         -0.01             -0.01              -0.03           0.03                B
6         -0.01              0.00              -0.06           0.06                C
7         -0.02              0.01              -0.08           0.08                Z
8         -0.02              0.01              -0.05           0.05                C
9         -0.01              0.01              -0.09           0.09                Z
10        -0.03              0.00              -0.09           0.09                Z
11        -0.02              0.00              -0.04           0.04                B
12        -0.02              0.01              -0.06           0.06                C
13        -0.03             -0.01               0.01           0.04                B
14        -0.01             -0.01              -0.04           0.04                B
15        -0.01             -0.01              -0.06           0.06                C
16        -0.02              0.02              -0.05           0.06                C
17        -0.02             -0.02              -0.07           0.07                Z
18        -0.01              0.00              -0.06           0.06                C
19        -0.02              0.02              -0.06           0.06                C
20        -0.02              0.01              -0.05           0.05                C
21        -0.02              0.01              -0.04           0.04                B
22        -0.01              0.01              -0.02           0.02                A
23        -0.01              0.02              -0.05           0.05                C
24        -0.02              0.01              -0.01           0.02                A
25        -0.02              0.02              -0.05           0.06                C
26        -0.02              0.02              -0.03           0.04                B
27        -0.03              0.01              -0.03           0.04                B
28        -0.01              0.01              -0.04           0.04                B
29        -0.02              0.01              -0.03           0.04                B
30        -0.02              0.01              -0.05           0.05                C
31        -0.02              0.00              -0.04           0.04                B
32        -0.02              0.01              -0.04           0.05                C
33        -0.02              0.01              -0.04           0.05                C

Recall that in the training stage each measurement session was described by 17 attributes, the last of which was the accuracy class of the coordinates, known at that stage. This matters because, in the validation step, each instance representing a measurement session contained only the first 16 attributes; the 17th attribute, accuracy, was left for the Decision Tree to predict.
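The data mining tool and splitting criterion used to build the paper's Decision Tree are not given in this section. As a stand-in, the sketch below swaps in a minimal CART-style tree (Gini impurity, numeric thresholds) written in plain Python, only to illustrate the data layout described above: training instances carry 17 attributes whose last value is the accuracy class, and validation instances carry just the 16 predictor attributes. The function names, the depth limit, and the sample values are hypothetical.

```python
from collections import Counter

N_PREDICTORS = 16  # attributes 1-16; attribute 17 is the accuracy class


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(rows, labels):
    """Return the (attribute index, threshold) that most reduces Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send instances both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=5):
    """Grow the tree recursively; leaves are class labels, nodes are (j, t, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    j, t = split
    go_left = [r[j] <= t for r in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [y for y, g in zip(labels, go_left) if g], depth + 1, max_depth)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [y for y, g in zip(labels, go_left) if not g], depth + 1, max_depth)
    return (j, t, left, right)


def train(instances):
    """Training stage: each instance has 17 attributes, the 17th being the class."""
    rows = [inst[:N_PREDICTORS] for inst in instances]
    labels = [inst[N_PREDICTORS] for inst in instances]
    return build_tree(rows, labels)


def predict(tree, instance):
    """Validation stage: the instance carries only the 16 predictor attributes."""
    if len(instance) != N_PREDICTORS:
        raise ValueError(f"expected {N_PREDICTORS} attributes, got {len(instance)}")
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if instance[j] <= t else right
    return tree


if __name__ == "__main__":
    # Hypothetical instances: attribute 1 varies, attributes 2-16 are padding.
    training = [[0.01] + [0.0] * 15 + ["A"], [0.02] + [0.0] * 15 + ["A"],
                [0.05] + [0.0] * 15 + ["C"], [0.06] + [0.0] * 15 + ["C"]]
    tree = train(training)
    print(predict(tree, [0.015] + [0.0] * 15))  # A
```

Keeping the class as the last attribute mirrors the 17/16 split in the text: the same slicing (`inst[:16]`) produces the predictors in both stages, so the validation instances can be fed to the tree unchanged.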

All new instances could be validated because the coordinates of the SPPI station are known. The validation test produced 29 correct predictions out of 33, a success rate of 88%, slightly above the initial expectation, which had assigned the Decision Tree a confidence level of 86%.
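The arithmetic behind the 88% figure is reproduced below. As an added illustration not taken from the paper, the sketch also computes the binomial standard error of a proportion estimated from 33 predictions (about 5.7 percentage points), which suggests that the roughly 2-point margin over the expected 86% is small relative to the sampling uncertainty of a 33-instance test.

```python
import math

correct, total = 29, 33        # validation outcome reported in the text
expected_confidence = 0.86     # initial expectation reported in the text

hit_rate = correct / total                                # about 0.879, i.e. 88 %
std_error = math.sqrt(hit_rate * (1 - hit_rate) / total)  # binomial standard error

print(f"hit rate = {hit_rate:.1%}, expected = {expected_confidence:.0%}, "
      f"difference = {hit_rate - expected_confidence:+.1%}, SE = {std_error:.1%}")
```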

IV.  CONCLUSIONS 

As predicted by WITTEN and FRANK (2005) [6], the Decision Tree performed better once cross-data were introduced alongside the initial data. Variables 12 and 13 were introduced to express, respectively, the proportion of rejected GPS epochs and the proportion of rejected GLONASS epochs; they also make evident how much each positioning system contributes to the final result. Because these variables show each system's data utilization individually, they helped condition the Decision Tree for future interpretations.
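Variables 12 and 13 are derived (cross) attributes computed from per-system epoch counts. Their exact definition is not given in this section; the sketch below assumes each one is the number of rejected epochs of a system divided by the total number of epochs of that system, and it also computes each system's share of the epochs actually used. The `SessionEpochs` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionEpochs:
    """Hypothetical per-session epoch counts from a PPP-GNSS processing run."""
    gps_total: int
    gps_rejected: int
    glonass_total: int
    glonass_rejected: int


def rejected_ratio(rejected: int, total: int) -> float:
    """Share of a system's epochs that were rejected (0.0 if the system had no epochs)."""
    return rejected / total if total else 0.0


def derived_attributes(s: SessionEpochs) -> dict:
    """Assumed form of variables 12 and 13, plus each system's share of the used epochs."""
    used_gps = s.gps_total - s.gps_rejected
    used_glo = s.glonass_total - s.glonass_rejected
    used_all = used_gps + used_glo
    return {
        "var12_gps_rejected": rejected_ratio(s.gps_rejected, s.gps_total),
        "var13_glonass_rejected": rejected_ratio(s.glonass_rejected, s.glonass_total),
        "gps_share_of_used": used_gps / used_all if used_all else 0.0,
        "glonass_share_of_used": used_glo / used_all if used_all else 0.0,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(derived_attributes(SessionEpochs(2880, 144, 2880, 576)))
```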

It has been confirmed that the precision of GNSS measurements is indeed very different from their accuracy. Figure 4 shows this difference very clearly. The figure also shows that the relationship between accuracy and precision is not deterministic; therefore, each positioning result has to be checked individually, otherwise a bad result may be accepted as good. The Decision Tree is a tool that allows the user to anticipate the relationship between accuracy and precision.

In this study, data processing with the PPP-GNSS method reached an accuracy of decimetric order, as already estimated by SILVA and SEGANTINE (2015) [5]. This level of quality puts the method on an equal footing with other precise positioning methods and in a much better position than was initially assumed.

The results obtained are satisfactory and entirely within the expected range, since both the set of precisions and the set of accuracies behaved very consistently. Only 4 of the accuracy values in the 396 measurement sessions did not follow the behavior of the group, and even these were better than 2 centimeters; according to LINOFF and BERRY (2011) [10], this is not significant for the study. From the results of this study, which used six months of data to show that the accuracy of the coordinates is a more important parameter than their precision, it can be concluded that:

It was clearly demonstrated that accuracy is different from precision, the indicator that accompanies the coordinates calculated by any GNSS positioning method, including the PPP-GNSS method. This is shown by the distance between the two in Figure 2.

The 396 measurement sessions used to create and train the Decision Tree showed a correlation between precision and accuracy in the GNSS data, suggesting that there may be one or more rules connecting them, which need to be investigated.
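The text does not say which statistic revealed this correlation. One common way to quantify a linear relationship between the precision reported for each session and its measured accuracy is Pearson's coefficient, sketched below with the standard library only; the sample values are hypothetical. Even a strong coefficient would not make the relationship deterministic, which is consistent with the recommendation to check each result individually.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-session values (m): reported precision and measured accuracy.
    precision = [0.010, 0.012, 0.015, 0.011, 0.020]
    accuracy = [0.04, 0.05, 0.06, 0.04, 0.08]
    print(f"r = {pearson(precision, accuracy):.2f}")
```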

As a support tool, the Decision Tree can be applied to the investigation of the accuracy obtained with other GNSS positioning methods, since, regardless of the method used to obtain the solution, the result of any GNSS measurement session is the set of coordinates of the measured point, always accompanied by the statistical indicator of precision, the element used as a variable in this study.

[10] Linoff, G. S.; Berry, M. J. A. 2011. “Data Mining Techniques”. Third Edition, Wiley Publishing Inc., USA.